Abstract: Optical character recognition (OCR) is the process of translating handwritten or printed text into a machine-readable format for editing, searching, and indexing. Preprocessing, segmentation, feature extraction, classification, and post-processing are the main phases of any OCR system, and all of them are in use today. Among these phases, segmentation plays a crucial role in the overall performance of the OCR system. Segmentation can be further divided into line, word, and character segmentation. In this paper, we discuss different segmentation methods used in various domains. Some of the methods are designed for handwritten documents and others for printed documents. The major focus of this research is to identify approaches that can segment compound and fused character symbols. After analyzing the existing segmentation methods, we identify those best suited to compound and fused character symbols as candidates for future research. Segmentation remains a frontier area of research in image processing and pattern recognition. There is a large demand for OCR on Odia handwritten documents. The objective of this paper is to present a survey of the different existing segmentation methods developed during the last decade. The paper concludes by suggesting future directions for research in this area.
Keywords: OCR, Text Line Segmentation, Word Segmentation, Character Segmentation, Odia Handwritten and Printed Documents.